An Evolutionary Perspective of Software Engineering
Research  Through  Co-Word  Analysis 

Abstract: This study applies various tools, techniques, and methods that the
Software Engineering Institute is evaluating for analyzing information being
produced at a very rapid rate in the discipline, both in practice and in research.
The focus here is on mapping the evolution of the research literature as a
means to characterize software engineering and distinguish it from other
disciplines. Software engineering is a term often used to describe
programming-in-the-large activities. Yet, any precise empirical characterization
of its conceptual contours and their evolution is lacking. In this study, a large
[...] methodology identifies associations among publication descriptors (indexing
terms) from the Computing Classification System [...]
terms that reveal patterns of associations. The results suggest that certain
research themes in software engineering remain constant, but with changing
thrusts. [...]
predominate for the most recent time period covered (1991-1994); object-
oriented methods and user interfaces are identifiable as central themes.


1  Introduction 


1.1  Motivation for This Empirical Study

In most disciplines, both research and practice generate information at a very rapid rate,
and software engineering is no exception. The Software Engineering Institute is evaluating
tools, techniques, and methods that aid in managing this information explosion for the
discipline of software engineering, the Software Engineering Institute itself, as well as
organizations doing software-intensive work. This study focuses on the discipline of software
engineering as a whole, especially with respect to research literature being produced in the
field. It is important to note, however, that many of the same tools, techniques, and methods
for [...] information and for detecting patterns and trends at the global research level are
also applicable at the local organizational level.

1.2  Questions Addressed, Tools Employed

Interesting discussions about the nature and status of software engineering have occurred in
recent years. We thought it would be interesting to explore this issue by letting the research
in software engineering describe itself through the medium of the information management tools
we had been investigating. We formulated the questions as the following:


Is  software  engineering  a  child  of  computer  science,  computer  engineering,  or  information 
systems,  or  is  it  an  intersecting-but  relatively  independent-discipline?  Is  software  engineering 
changing  with  respect  to  its  primary  foci? 

These are important issues in industry and academe because they address research,
application, and curriculum concerns. These topics have been discussed by many professionals
[Ford 89], [Denning 92], [Dijkstra 89], [Gibbs 91], [Gibbs 89], [Gries 91], [Parnas 90],
[Parnas ... 89], [Shaw 90], [Jackson 94], [Brooks 87], [Coulter 94]. In addition, a special
ACM/IEEE Computer Society task force is now commissioned to consider the matter [Buckley
93], [Boehm 94].

While  discussions  about  computer  science/software  engineering  are  useful,  empirical  studies 
of  the  issue  are  also  needed.  Such  studies  require  a  carefully  considered  methodology  and 
accompanying  data  sets.  The  methodology  we  have  chosen  is  based  on  co-word  analysis 
[Callon 86], [Callon 91], [Courtial 89], [Law 92], [Whittaker 89]. Co-word analysis reveals
patterns of associations among terms by measuring and representing the associations of terms
describing technical publications or other technical texts.

This study uses co-word analysis to provide insight into the nature of software engineering.
Our hypothesis is that the identified patterns of term associations are maps of the conceptual
space of software engineering and its relations to other computing fields. Further, a series of
such maps constructed for different time periods suggests a trace of the changes in this
conceptual space.

The technique is applied to a very large cross-section of published text (1982-1994) in the
computing field that is indexed with descriptors from the well-known Computing Classification
System (CCS).1 This indexed text comes from the Association for Computing Machinery's
(ACM) Guide to Computing Literature (GUIDE), which covers an ACM publications database.
Through professional indexers, GUIDE annually indexes over 20,000 items with descriptors from
the CCS.

CCS  is  a  carefully  designed  taxonomy  that  has  existed  since  1982  [Sammet  82]  and  it  has 
been updated three times [Sammet 83], [Sammet 87], [Coulter 91]. Because CCS classifies
publications  over  the  breadth  of  computing,  it  allows  us  to  investigate  trends  and  the  position 
of  software  engineering  in  the  larger  computing  context. 


1. Until 1996, the Computing Classification System (CCS) was called the Computing Reviews
Classification System (CRCS).


2. Descriptors selected from CCS are distinguished from keywords freely chosen by the author.
Only CCS descriptors were used in this study. The issue of descriptors selected by professional
indexers, as opposed to free selection of keywords by the authors, is important here. While both
may have merits, we believe it is useful to study a fixed system that imposes a common
nomenclature across all computing. Professional indexers experienced in using the CCS assure
standard application of that taxonomy. Law and Whittaker [Law 92], [Whittaker 89] addressed
these issues extensively.

1.3  Intended Audiences - Alternative Routes Through the Paper

There  are  several  different  kinds  of  audiences  for  this  paper.  It  can  be  read  on  three  levels: 

1. At one level it is an attempt to characterize software engineering as a discipline,
   both in its own right and in its important differences from other related
   disciplines. No particular background is required, though some familiarity with
   various issues in software engineering research or practice is necessary to
   appreciate the conclusions reached.

2. Some readers may be just as interested in finding out how useful the tools,
   techniques, and methods are in answering the kinds of questions posed by
   the study; they may have an interest in just how accurate and informative
   such approaches are at summarizing large amounts of information and detecting
   patterns and trends in it. A willingness to wade through some descriptions
   of information retrieval and statistical techniques is required, but these
   descriptions are self-contained.

3. A third group of readers might be interested in evaluating these tools,
   techniques, and methods to gain an understanding of how they might be applied
   in their own work. Here some familiarity with current work in information
   retrieval and computational linguistics would be useful, though not required for
   anyone doing technical work in software engineering.

Following the introduction, a discussion of the data and its sources begins the main body of
the paper. This includes the descriptors and codes used for indexing the software engineering
documents, the sources of the documents indexed, and the numbers of documents covered
in each of the respective time periods. This discussion of what is analyzed is followed by
a discussion in the next section of how it is analyzed. In particular, the metric for determining
co-occurrence strength between descriptors associated with the same documents is
described. In the same section, the algorithm used to generate networks of co-occurring
descriptors is detailed. Example networks generated from descriptors of the software
engineering literature are presented.

Next comes a discussion of the methods used for interpreting networks: in particular, the
method used for naming them and a more technical discussion of how complexity of networks
is measured. These two discussions of methods are each followed by presentations of
findings that list, analyze, and describe the networks found in the time periods covered. A
more general discussion of types of networks comes next; it focuses on two of their
distinguishing factors called centrality and density. Examples from analyses of the current data
are provided and implications discussed.

Methods for identifying relationships among networks within a time period are pursued next,
followed by a discussion of what was found when these methods were applied to the networks
generated. Then the findings from each time period are compared and contrasted in order to
determine how the discipline of software engineering has evolved over time. The evolution of
the programming language Ada is presented as an example. Finally, the last section before the
conclusion discusses the distributions of categories of descriptors from the point of view of
those that made it into the co-occurrence networks against those that did not.

To assist the reader, some sections will have a heading for Methodology and for Findings.
These sections will expand on the research methods and on the software engineering specifics,
respectively.

2  The Data and Its Descriptors

Co-word methodology operates on indexed textual data. This chapter describes these two
components for the study. Here, index terms used are taken directly from a standard taxonomy.
In other applications at the SEI, software is used to generate index terms directly
from the studied corpora.

GUIDE reviews and indexes a large number of publications across the spectrum of computing.
Publications reviewed generally include books, book chapters, journals, proceedings, trade
magazines and other applied sources, and occasionally other media such as videotaped
material. For the latest list of publications received, see the November 1995 issue of Computing
Reviews [CR 95]. In addition, GUIDE indexes many proceedings and articles from proceedings.

CCS uses a classification system with up to four levels. Any descriptors semantically below the
fourth level are nevertheless grouped at this level (note that not all sections of the tree have
four levels). The major CCS categories are listed below:

A - General Literature
B - Hardware
C - Computer Systems Organization
D - Software
E - Data
F - Theory of Computation
G - Mathematics of Computing
H - Information Systems
I - Computing Methodologies
J - Computer Applications
K - Computing Milieux

The full CCS is described in the January 1996 issue of Computing Reviews [CR 96].

A complete rendition of the software engineering section of the taxonomy, D.2, follows.
Superscripts (shown here as a trailing two-digit year) indicate that a descriptor is new
beginning with the CCS revision of that year.3


3. In some cases, a descriptor may appear in a document indexed before the official adoption
of an updated version of CCS. Updates to CCS are always announced in January; however, the
revision may be completed and in use by the indexers for a short time prior to its official
release. Some documents written in one year may not be indexed until after a revision of CCS
is in place, so the older version of CCS is no longer applied. In any case, these occurrences
are not common.

D.2  SOFTWARE ENGINEERING

D.2.0  General
    Protection mechanisms
    Standards
D.2.1  Requirements/Specifications
    Languages
    Methodologies
    Tools
D.2.2  Tools and Techniques
    Computer-aided software engineering (CASE) 91
    Decision tables
    Flow charts
    Modules and interfaces
    Petri nets 91
    Programmer workbench
    Software libraries
    Structured programming
    Top-down programming
    User interfaces
D.2.3  Coding
    Pretty printers
    Program editors
    Reentrant code
    Standards
D.2.4  Program Verification
    Assertion checkers
    Correctness proofs
    Reliability
D.2.5  Testing and Debugging
    Code inspections and walk-throughs 91
    Debugging aids
    Diagnostics
    Dumps
    Error handling and recovery
    Symbolic execution
    Test data generators
    Tracing
D.2.6  Programming Environments
    Interactive 87
D.2.7  Distribution and Maintenance
    Corrections
    Documentation
    Enhancement
    Extensibility
    Portability
    Restructuring
    Version control
D.2.8  Metrics
    Performance measures
    Software science
D.2.9  Management
    Copyrights
    Cost estimation
    Life cycle
    Productivity
    Programming teams
    Software configuration management
    Software quality assurance
    Time estimation 91
D.2.10  Design 87
    Methodologies 87
    Representation 87
D.2.m  Miscellaneous
    Rapid prototyping 83
    Reusable software 83


An item is almost always classified by multiple CCS descriptors. Even though there are up to
four CCS levels, an item can be classified at any level that is appropriate; not all branches of
CCS have four levels. CCS does not include names of systems and languages (Unix, Ada,
Windows, etc.); instead, they are called implicit subject descriptors and can be used by
indexers as needed. As we will see, their inclusion is common and often significant.

We obtained descriptors for all items indexed in GUIDE that had at least one descriptor in the
D.2 category. Hence, the study admits descriptors from throughout CCS as long as an item
has at least one D.2 descriptor. This selection allows us to examine interactions of software
engineering nodes with other nodes in CCS. We could have refined this study by selecting
more specific CCS descriptors (such as how Software Engineering [D.2] interacts with
Programming Techniques [D.1],4 for example). However, this study focuses on the larger
question of how software engineering interacts with computing as a whole, i.e., on the
interactions of software engineering with all other nodes of the CCS. The data we received
reflect the March 1995 update to the GUIDE database.5 Table 1 shows the numbers of indexed
documents that we analyzed for the years 1982-1994.

4. We show the corresponding CCS node after a descriptor for context when needed.


Table 1: Distribution of Documents by Year

Year     Number of Documents
1982         81
1983         33
1984        211
1985        367
1986      1,027
1987      1,479
1988      2,329
1989      1,928
1990      1,914
1991      1,738
1992      2,016
1993      2,159
1994      1,612
Total    16,691

The total is 16,691 documents. As is evident, the number of documents was small until 1986.
The 16,691 items were indexed by a total of 57,727 descriptors (a mean of 3.46 per item).

For analysis, we grouped the data for the years 1982-1986, 1987-1990, and 1991-1994. This
separates the sparse years 1982-1986 from the others, gives approximately equal numbers
of documents in the latter two periods, and provides breaks when CCS was updated so we do
not confuse new descriptors across periods. Data for documents, descriptors, and their ratios
for the time periods are shown in Table 2.


Table 2: Documents and Descriptors per Time Period

Time Period    Documents    Descriptors    Descriptor/Document Ratio
1982-1986        1,646         5,645         3.43
1987-1990        7,650        28,471         3.72
1991-1994        7,395        23,611         3.19
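
The ratios in Table 2 follow directly from the document and descriptor counts. As a small check, the sketch below recomputes them in Python; the variable and function names are illustrative choices, not part of the study.

```python
# Document and descriptor counts per period, copied from Table 2.
periods = {
    "1982-1986": (1_646, 5_645),
    "1987-1990": (7_650, 28_471),
    "1991-1994": (7_395, 23_611),
}


def descriptor_ratio(documents, descriptors):
    """Mean number of descriptors per document, rounded to two places."""
    return round(descriptors / documents, 2)


# Per-period ratios (hypothetical helper names, values from Table 2).
ratios = {name: descriptor_ratio(d, k) for name, (d, k) in periods.items()}

# Totals across all three periods and the overall mean per item.
total_documents = sum(d for d, _ in periods.values())
total_descriptors = sum(k for _, k in periods.values())
overall_ratio = descriptor_ratio(total_documents, total_descriptors)

for name, ratio in ratios.items():
    print(name, ratio)
print("all periods", total_documents, total_descriptors, overall_ratio)
```

The period totals reproduce the 16,691 documents and 57,727 descriptors quoted in the text.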

3  The Metric and the Algorithm

While some CCS-based results are presented here to demonstrate the methodology, this
chapter focuses on the underlying theory of co-word analysis. All readers need the material this
[...]

Co-word analysis enables the structuring of data at various levels of analysis: (1) as networks
of links and nodes (nodes in our networks contain descriptors that index documents); (2) as
distributions of networks called super networks; and (3) as transformations of networks and
super networks over time periods. These structures and changing relationships provide a
basis for tracing the evolution of software engineering.

Co-word analysis reduces a large space of related terms to multiple related smaller spaces
that are easier to comprehend, but that also indicate actual partitions of interrelated concepts
in the literature being analyzed. This analysis requires an association measure and an
algorithm for searching through the space.

The analysis is designed to identify areas of strong focus that interrelate. This scheme allows
us to construct a mosaic of software engineering topics.

3.1  The  Metric 

Metrics for co-word analysis have been studied extensively [Callon ...], [...], [Whittaker 89].
The basic metric most suitable for this study is Strength S (called Equivalence Index by
Callon). It is described as follows:

Two descriptors, $i$ and $j$, co-occur if they are used together in the classification of a
single document. Take a corpus consisting of $N$ documents. Each document is indexed by a set
of unique descriptors that can occur in multiple documents. Let $c_i$ be the number of
occurrences of descriptor $i$, that is, the number of times $i$ is used for indexing documents
in the corpus, and define $c_j$ likewise for descriptor $j$. Let $c_{ij}$ be the number of
co-occurrences of descriptors $i$ and $j$ (the number of documents indexed by both descriptors).

Then Strength $S$ of association between descriptors $i$ and $j$ is given by the expression:

$$S(c_i, c_j, c_{ij}) = \frac{c_{ij}^2}{c_i \, c_j}, \qquad 0 \le S \le 1.$$

Two descriptors that appear many times in isolation but only a few times together will yield a
low S value, whereas two descriptors that appear less often alone but have a higher ratio of
co-occurrences will yield a higher S value.
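
To make the metric concrete, the following is a minimal Python sketch that counts occurrences and co-occurrences in a toy corpus and evaluates the Strength (Equivalence Index) defined above as the squared co-occurrence count divided by the product of the two occurrence counts. The corpus, the helper names `count_occurrences` and `strength`, and the choice to store each pair in alphabetical order are assumptions made for illustration only.

```python
from collections import Counter
from itertools import combinations


def count_occurrences(documents):
    """Return (single, pair) counters for a list of descriptor sets.

    single[d] is the number of documents indexed by descriptor d;
    pair[(a, b)] is the number of documents indexed by both a and b,
    with each pair stored in alphabetical order.
    """
    single = Counter()
    pair = Counter()
    for descriptors in documents:
        unique = sorted(set(descriptors))
        single.update(unique)
        pair.update(combinations(unique, 2))
    return single, pair


def strength(c_i, c_j, c_ij):
    """Strength S = c_ij^2 / (c_i * c_j); the value lies between 0 and 1."""
    return (c_ij * c_ij) / (c_i * c_j)


# Hypothetical toy corpus: each document is a set of descriptors.
corpus = [
    {"User interfaces", "Human factors"},
    {"User interfaces", "Human factors"},
    {"User interfaces", "Metrics"},
    {"Metrics", "Reliability"},
]

single, pair = count_occurrences(corpus)
s_ui_hf = strength(
    single["User interfaces"],
    single["Human factors"],
    pair[("Human factors", "User interfaces")],
)
s_metrics_rel = strength(
    single["Metrics"],
    single["Reliability"],
    pair[("Metrics", "Reliability")],
)
print(s_ui_hf, s_metrics_rel)
```

In this toy corpus, "Reliability" occurs only once but always with "Metrics", so that pair scores relatively high despite a single co-occurrence. This is the effect that the minimum co-occurrence constraint discussed later is meant to control.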

3.2  The  Algorithm 

The algorithm makes two passes through the data to produce pair-wise connections of
descriptors. A network consists of nodes (descriptors) connected by links. Each node
must be linked to at least one other node in a network. The first pass (Pass-1) generates the
primary associations among descriptors; these descriptors are called internal nodes and the
corresponding links are called internal links. A second pass (Pass-2) generates links between
Pass-1 nodes across networks, thereby forming associations among completed networks.
Pass-2 nodes and links are called external ones.

CMU/SEI-95-TR-019

Pass-1  builds  networks  that  can  identify  areas  of  strong  focus;  Pass-2  can  identify  descriptors 
that  associate  in  more  than  one  network  and  thereby  indicate  pervasive  issues.  This  pattern 
of  networks  yields  a  mosaic  of  the  data  being  analyzed. 

3.2.1  Pass-1 

During Pass-1, the link that has the highest strength is selected first. These linked nodes
become the starting points for the first network. Other links and their corresponding nodes are
then determined breadth-first.

Figure 1 illustrates this process for a 1991-1994 Pass-1 network. This figure displays the
network connections as a map.6 This network, named User Interfaces, is the first one created by
the co-word algorithm for 1991-1994 data. The links are numbered in the order formed.

All nodes contained in the resulting Pass-1 network are removed from consideration for
inclusion in subsequent Pass-1 networks. The next network then starts with the link of highest S
value of the remaining links (i.e., ones not containing nodes from any previous network).

This Pass-1 strategy does not necessarily (or usually) yield S strengths in strict descending
order, either within individual networks or among sequentially generated networks with
respect to the sum or average of S strengths. The first network becomes the first network only
because it starts with the highest link; the second network then starts with the highest link
among remaining links, and so forth. This order of generation is not especially significant
because it is possible that the links included in a network after the initial link do not have
co-occurrence strengths in the same high range as this initial link.
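
The Pass-1 procedure described above can be sketched in Python as follows. This is one possible reading of the text, not the authors' implementation: the function name `pass1_networks`, the input format (a dictionary from descriptor pairs to co-occurrence count and strength), the alphabetical tie-breaking, and the toy data are all assumptions. For brevity the sketch also leaves out links between two nodes that are already in the same network.

```python
from collections import deque


def pass1_networks(links, min_cooc, max_nodes, max_links):
    """Greedy, breadth-first construction of Pass-1 networks.

    links maps (a, b) descriptor pairs to (co_occurrence, strength).
    Returns a list of networks, each with ordered "nodes" and "links".
    """
    # Qualifying links, strongest first (ties broken alphabetically).
    candidates = sorted(
        ((s, a, b) for (a, b), (c, s) in links.items() if c >= min_cooc),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    used = set()      # nodes already assigned to an earlier network
    networks = []
    while True:
        # Seed: the strongest remaining link with both nodes still unused.
        seed = next(
            ((a, b) for _, a, b in candidates
             if a not in used and b not in used),
            None,
        )
        if seed is None:
            break
        nodes = [seed[0], seed[1]]
        net_links = [seed]
        queue = deque(nodes)
        # Expand breadth-first from each node, strongest links first.
        while queue and len(nodes) < max_nodes and len(net_links) < max_links:
            current = queue.popleft()
            for _, x, y in candidates:
                if current not in (x, y):
                    continue
                other = y if x == current else x
                if other in used or other in nodes:
                    continue
                nodes.append(other)
                net_links.append((current, other))
                queue.append(other)
                if len(nodes) >= max_nodes or len(net_links) >= max_links:
                    break
        used.update(nodes)
        networks.append({"nodes": nodes, "links": net_links})
    return networks


# Hypothetical toy data: pair -> (co-occurrence count, strength S).
toy_links = {
    ("A", "B"): (20, 0.9),
    ("B", "C"): (18, 0.5),
    ("C", "D"): (16, 0.4),
    ("E", "F"): (30, 0.6),
    ("A", "E"): (3, 0.8),    # strong but too rare to qualify
    ("D", "E"): (15, 0.1),
}
networks = pass1_networks(toy_links, min_cooc=15, max_nodes=3, max_links=24)
for net in networks:
    print(net["nodes"], net["links"])
```

In the toy run, the second network starts from the E-F link even though the A-E link has a higher strength, because A-E falls below the co-occurrence minimum and A already belongs to the first network.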

Figure  2  shows  Pass-1  links  for  a  second  1991-1994  network.  This  network,  named  General, 
was  the  ninth  one  generated  from  1991-1994  data. 


6. These were originally called Leximappes [Turner 88].

[Figure 1 diagram not reproduced. Panel title: 1991-1994 Map 1: User Interfaces. Node labels
recoverable from the diagram: Human factors H.1.2; User interfaces D.2.2; Interaction
techniques I.3.6; User interface management systems (UIMS) H.5.2; Evaluation/methodology
H.5.2; User/machine systems H.1.2; User interfaces H.5.2.]

Figure 1: First Example of a Pass-1 Network


[Figure 2 diagram not reproduced. Node labels recoverable from the diagram: Curriculum K.3.2;
General D.2.0; General D.3.0.]

Figure 2: Second Example of a Pass-1 Network


3.2.2  Pass-2 

The second pass (Pass-2) is designed to seek further associations among descriptors found
in Pass-1. During Pass-2, networks are extended by the addition of Pass-2 links. To be a
candidate for inclusion in Pass-2, both nodes (descriptors) of a Pass-2 link must be in some
Pass-1 networks. A Pass-2 link connects a Pass-1 node in a given network to a node that had
occurred as a Pass-1 node in another network but is represented in the given network as a
Pass-2 node.7 Pass-2 nodes and Pass-2 links are represented by thin boxes and by thin lines
connecting them with Pass-1 nodes, respectively. Pass-2 becomes the basis for determining how
networks fit together in larger super networks (see Chapter 6, Super Network Analysis).

As in Pass-1, candidate links are included in Pass-2 based on their strengths and
co-occurrence counts. The order of Pass-2 links is by descending values for qualifying links. A
node can appear in only one Pass-1 network, but can appear in more than one Pass-2 link.
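
A minimal Python sketch of Pass-2 under the same assumptions is shown below. It takes already built Pass-1 networks and, for each one, adds the strongest qualifying links that join one of its nodes to a Pass-1 node of another network. The function name `pass2_links`, the data layout, and the toy values are hypothetical. Only a total link maximum is enforced here; the node maximum mentioned in the text is omitted to keep the example short.

```python
def pass2_links(networks, links, min_cooc, max_total_links):
    """Return, per Pass-1 network, its external (Pass-2) links.

    networks: list of dicts with "nodes" and "links" (Pass-1 results).
    links: maps (a, b) descriptor pairs to (co_occurrence, strength).
    """
    # Index of the Pass-1 network in which each descriptor appears.
    home = {
        node: idx
        for idx, net in enumerate(networks)
        for node in net["nodes"]
    }
    # Qualifying links whose two nodes are both Pass-1 nodes somewhere,
    # in descending order of strength (ties broken alphabetically).
    ranked = sorted(
        ((s, a, b) for (a, b), (c, s) in links.items()
         if c >= min_cooc and a in home and b in home),
        key=lambda t: (-t[0], t[1], t[2]),
    )
    external = []
    for idx, net in enumerate(networks):
        budget = max_total_links - len(net["links"])
        chosen = []
        for _, a, b in ranked:
            if len(chosen) >= budget:
                break
            # Keep links with exactly one end inside the current network.
            if (home[a] == idx) != (home[b] == idx):
                chosen.append((a, b))
        external.append(chosen)
    return external


# Hypothetical Pass-1 result and link data for illustration.
toy_networks = [
    {"nodes": ["A", "B", "C"], "links": [("A", "B"), ("B", "C")]},
    {"nodes": ["E", "F", "D"], "links": [("E", "F"), ("E", "D")]},
]
toy_links = {
    ("A", "B"): (20, 0.9),
    ("B", "C"): (18, 0.5),
    ("C", "D"): (16, 0.4),
    ("E", "F"): (30, 0.6),
    ("A", "E"): (3, 0.8),    # below the co-occurrence minimum
    ("D", "E"): (15, 0.1),
    ("A", "F"): (15, 0.2),
}
external = pass2_links(toy_networks, toy_links, min_cooc=15, max_total_links=4)
print(external)
```

No descriptors are removed between networks, so the same cross-network link can be reported for both of the networks it joins, consistent with a node appearing in more than one Pass-2 link.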

Figure 3 illustrates this process for Pass-2 of the network in Figure 1. Recall that Pass-2 nodes
must always appear previously as Pass-1 nodes in other networks. Curriculum (K.3.2), a Pass-1
node of the network in Figure 2, forms a Pass-2 connection with the Pass-1 node Computer
Science Education (K.3.2) via link 11 in Figure 3.


7. Sometimes two Pass-1 nodes in a network are joined during Pass-2; such links are considered
Pass-1 links because they join two Pass-1 nodes.

Table 3 shows data for Pass-1 and Pass-2 links in Figure 3. The Pass-1 networks of all nodes
incorporated  during  Pass-2  are  given  in  the  last  column  (it  is  1  for  all  Pass-1  links).  Two  nodes 
from  Figure  2  (Map  9  of  1991-1994)  are  in  Pass-2  links  of  the  Figure  1  network.  Other  links 
come  from  the  various  Pass-1  networks  for  the  data. 


Table 3: Links in Decreasing Order of Strength

Order  Node 1                                          Node 2                            Co-Occurrence  Strength (S)  Pass-1 Map

Pass-1
1      User interfaces D.2.2                           User interfaces H.5.2             177            0.181802      1
2      User interfaces H.5.2                           User/machine systems H.1.2        56             0.062695      1
3      User interface management systems (UIMS) H.5.2  User interfaces D.2.2             47             0.057496      1
4      User interfaces D.2.2                           User/machine systems H.1.2        69             0.051381      1
5      Interaction techniques I.3.6                    User interfaces D.2.2             32             0.036248      1
10     Computer science education K.3.2                Human factors H.1.2               20             0.029121      1
6      Interaction styles H.5.2                        User interfaces D.2.2             44             0.025195      1
7      Evaluation/methodology H.5.2                    User interfaces D.2.2             16             [...]         1
8      Screen design H.5.2                             User interfaces D.2.2             16             0.012246      1
9      Human factors H.1.2                             User interfaces D.2.2             27             0.009487      1

Pass-2
11     Computer science education K.3.2                Curriculum K.3.2                  18             0.080198      9
12     Computer science education K.3.2                General D.2.0                     38             0.034534      9
13     Ada D.3.2                                       Computer science education K.3.2  17             0.009506      7
14     Human factors H.1.2                             Software development K.6.3        24             0.009167      3
15     Design D.2.10                                   User/machine systems H.1.2        18             0.007778      8
16     Interaction styles H.5.2                        Windows D.2.2                     29             0.007569      5
17     Object-oriented programming D.1.5               User interfaces D.2.2             55             0.007446      3
18     User interfaces H.5.2                           X-Windows D.2.2                   19             0.005405      6
19     Tools and techniques D.2.2                      User interfaces H.5.2             34             0.005361      7
20     Management D.2.9                                User interfaces D.2.2             29             0.004623      3
21     Management D.2.9                                User/machine systems H.1.2        15             0.004261      3

3.2.3  Algorithm Constraints

Without minimum constraints, descriptors appearing infrequently but almost always together
could dominate networks; hence a minimum co-occurrence ($c_{ij}$) value is required to
generate a link. At the same time, some maps can become cluttered due to an excessive number
of legitimate links (but of generally decreasing S values); hence, restrictions on numbers
of nodes and links are sometimes required to facilitate the discovery of major partitions of
concepts. Many networks, however, are limited only by the number of qualifying nodes, as we
will observe.

For the time periods of 1987-1990 and 1991-1994, 15 co-occurrences of descriptors were
required before they could become candidates for linking; for 1982-1986, the co-occurrence
cutoff was set at 5 to accommodate the lesser volume of data. For all time periods, the maximum
number of links and nodes in each network, both Pass-1 and Pass-2, was set at 24 links and 20
nodes. For these values, the co-word algorithm generated 15, 16, and 11 networks, respectively,
for the periods 1982-1986, 1987-1990, and 1991-1994. Table 4 summarizes these values.


Table 4: Parameters and Resulting Networks

Time Period    Minimum Co-Occurrence    Maximum Nodes    Maximum Links    Networks Generated
1982-1986       5                       20               24               15
1987-1990      15                       20               24               16
1991-1994      15                       20               24               11


3.2.4  Algorithm  Summary 

Following is a summary of the algorithm:

1.  Select a minimum for the number of co-occurrences, $c_{ij}$, for descriptors $i$ and $j$.
2.  Select maxima for the number of Pass-1 links and nodes.
3.  Select maxima for the total (Pass-1 and Pass-2) links and nodes.
4.  Start Pass-1.
5.  Generate the highest S value from all possible descriptors to begin a Pass-1 network.
6.  From that link, form other links in a breadth-first manner until no more links are possible
    due to the co-occurrence minimum or the Pass-1 link and node maxima. Remove all incorporated
    descriptors from the list of subsequently available descriptors.
7.  Repeat Steps 5 and 6 until no two remaining descriptors co-occur frequently enough to begin
    a network.
8.  Begin Pass-2.
9.  Restore all Pass-1 descriptors to the list of available descriptors.
10. Start with the first Pass-1 network. Generate links between Pass-1 nodes in the current
    network and Pass-1 nodes of other networks in descending order of S value; stop when no
    remaining descriptors meet co-occurrence minima or when total node or link maxima are met.
    Do not remove any descriptors from the available list.
11. Repeat Step 10 for each succeeding Pass-1 network.

A maximum number of Pass-1 networks can be specified in cases where an excessive number
of networks will be generated otherwise; this restriction was not necessary here.

Numerous  variations  of  this  algorithm  are  possible. 

3.3  Comments  on  Selection  of  Network  Parameters 

Link  and  node  limitations  mostly  determine  how  networks  will  be  generated  in  concert  with  the 
corresponding  co-occurrence  minimum.  If  the  co-occurrence  minimum  is  too  high,  few  links 
may  be  formed;  if  it  is  too  low,  an  excessive  number  of  links  may  result.  In  the  former  case, 
subspecialities  in  a  field  may  not  emerge;  in  the  latter  case,  a  field  may  look  disproportionately 
cluttered. 
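
The effect of the co-occurrence minimum on the pool of candidate links can be illustrated with a short Python sketch. The pair counts below are invented for illustration, and `qualifying_links` is a hypothetical helper; the thresholds 15 and 13 are simply two example settings.

```python
def qualifying_links(pair_counts, minimum):
    """Return the descriptor pairs whose co-occurrence count meets the minimum."""
    return {pair for pair, count in pair_counts.items() if count >= minimum}


# Hypothetical co-occurrence counts for a handful of descriptor pairs.
pair_counts = {
    ("Ada", "Reliability"): 17,
    ("Metrics", "Reliability"): 14,
    ("Ada", "Metrics"): 13,
    ("Design", "Metrics"): 9,
}

at_15 = qualifying_links(pair_counts, 15)
at_13 = qualifying_links(pair_counts, 13)

# Lowering the minimum can only add candidate links, never remove them.
print(len(at_15), len(at_13), sorted(at_13 - at_15))
```

Because every link that qualifies at the higher minimum also qualifies at the lower one, a lower setting enlarges the candidate pool; whether the extra links change the core networks depends on their strengths.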

The  parameters  for  1982-1986  were  chosen  somewhat  arbitrarily  because  of  the  small 
amount of data. We attempted to establish a baseline for comparison with following
generations. The primary point of contention was the co-occurrence value of 5. It is somewhat higher
in  proportion  to  the  number  of  documents  and  descriptors  than  the  value  of  15  for  succeeding 
generations.  We  feel  the  number  of  networks  and  super  networks  generated  supports  our 
choice. 

In setting co-occurrence values for the 1987-1990 and 1991-1994 generations, the proper
values could be determined at least two ways: as a function of the ratios of indexed items or the
ratio of the number of descriptors. We used the former. Because the numbers of items for the
generations were almost equal (7,650 and 7,395), we set the co-occurrences the same.
However, the numbers of descriptors were sufficiently different (28,471 and 23,611) to question if
the co-occurrence for 1991-1994 should be lower than for 1987-1990. To test this hypothesis,
we set the 1991-1994 co-occurrence at 13 and recomputed.

This change still resulted in 11 networks. Some networks were different, but only on the
fringes. The central themes remained the same. More links and nodes were realized with the lower
co-occurrence value (16% and 19%, respectively), as would be expected. Many of these new
links and nodes were formed through additional connections of already existing nodes in the
same and in other maps existing at the higher co-occurrence level. Additionally, 1 isolated
network with only 2 nodes was absorbed by a larger network at the 13 co-occurrence level, while
a new, isolated network with 3 nodes and 2 links emerged.

So,  while  the  link,  node,  and  co-occurrence  parameters  effectively  control  the  generation  of 
networks,  small  changes  in  their  values  appear  to  affect  only  marginal  links,  at  least  in  this 
study.  Of  course,  additional  and  subsequent  data  can  affect  the  generation  of  core  themes 
without  changes  in  parameters,  which  is  the  intent  of  co-word  analysis. 
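
The two scaling options discussed above can be illustrated with a little arithmetic. The following is a minimal sketch, not the procedure used in this study: it assumes a simple linear scaling of the 1987-1990 co-occurrence minimum of 15, and the function name scaled_minimum is hypothetical.

```python
def scaled_minimum(base_minimum, base_count, new_count):
    """Scale a co-occurrence minimum linearly with a count ratio.

    Linear scaling is an assumption made for illustration only.
    """
    return base_minimum * new_count / base_count


BASE_MINIMUM = 15  # co-occurrence minimum used for 1987-1990

# Counts reported in the text, in the order (1987-1990, 1991-1994).
ITEMS = (7650, 7395)
DESCRIPTORS = (28471, 23611)

by_items = scaled_minimum(BASE_MINIMUM, *ITEMS)
by_descriptors = scaled_minimum(BASE_MINIMUM, *DESCRIPTORS)

print(f"scaled by ratio of items:       {by_items:.1f}")
print(f"scaled by ratio of descriptors: {by_descriptors:.1f}")
```

Under this assumed rule, the item ratio leaves the minimum close to 15, while the descriptor ratio suggests a value between 12 and 13, which is in the neighborhood of the trial value of 13 described above.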
4  Network  Analysis


commentary  trends  in  software  engineering  publications. 

[...] maps [...] the appendix. A wealth of information emerges from these [...]


4.1  Network  Names 


4.1.1  Methodology 
4.1.2  Findings

The  names  chosen  are  as  follows: 

4.1.2.1  1982-1986  Networks

1.  Software  Management  -  Ada

Management 

2.  Logic  Programming 

3.  User  Interfaces 

Human  factors,  software  psychology 

4.  Standards 

5.  Tools  and  Techniques  -  Structured  Programming  -  Pascal 

6.  Software  Development 

Programming  environments 

7.  Software  Libraries 

8.  Testing  and  Debugging  -  Correctness  Proofs 

Software  quality  assurance,  concurrent  programming 

9.  Reliability 

10.  Program  Editors 
11.  Requirements/Specifications  -  Systems  Analysis  and  Design

12.  Modules  and  Interfaces

13.  Real-Time  Systems

14.  Abstract  Data  Types

15.  Metrics
Life  cycle

4.1.2.2  1987-1990  Networks

1.  Geometrical  Problems  and  Computations

2.  Correctness  Proofs  -  Languages 

Semantics,  real-time  and  embedded  systems 

3.  Logic  Programming 

4.  Requirements/Specifications  -  Methodologies 

Program  verification,  abstract  data  types 

5.  User/Machine  Systems 

User  interfaces,  human  factors 

6.  Methodologies  -  Software  Development 

Computer  science  education 

7.  Standards 

8.  Structured  Programming 

9.  Applications  and  Expert  Systems  -  Tools  and  Techniques 

Interactive 

10.  Concurrent  Programming  -  Ada 
Compilers 

11.  Computer-Aided  Design

12.  Error  Handling  and  Recovery 

13.  Distribution  and  Maintenance 

14.  Software  Configuration  Management 

15.  Reusable  Software 

16.  Software  Management  -  Design

4.1.2.3  1991-1994  Networks

1.  User  Interfaces

Computer  science  education 

2.  Petri  Nets 

3.  Software  Development  -  Object-Oriented  Programming 

4.  Software  Libraries  -  C++  -  Microsoft  Windows

Object-oriented  programming,  C 

5.  Windows 

6.  X-Windows 

7.  Tools  and  Techniques  -  CASE  -  Systems  Analysis  and  Design 

Ada,  object-oriented  programming,  programming  environments 

8.  Requirements/Specifications 

Testing  and  debugging,  program  verification 

9.  General 

Computer  science  education 

10.  Concurrent  programming 

11.  Metrics

Perusing the maps of networks in the appendix reveals several variations in structure. Some
maps have few nodes, some maps have many nodes, and some are dominated by connections
from one or two nodes. Others have distributed connections, while still others are not really
one map, but two (or three) maps. We will describe these variations more fully in the
following section.

For reference in the following sections, the primary network names for each time period are
given in Table 5. Note that networks are numbered sequentially in the order generated by
co-word analysis algorithms; hence, the same numbers do not imply the same network names
across time periods.
Table 5: Network Names and Numbers

No.  1982-1986                       1987-1990                       1991-1994
1    Software Management - Ada       Geometrical Problems and        User Interfaces
                                     Computations
2    Logic Programming               Correctness Proofs - Languages  Petri Nets
3    User Interfaces                 Logic Programming               Software Development -
                                                                     Object-Oriented Programming
4    Standards                       Requirements/Specifications -   Software Libraries - C++ -
                                     Methodologies                   Microsoft Windows
5    Tools and Techniques -          User/Machine Systems            Windows
     Structured Programming -
     Pascal
6    Software Development            Methodologies - Software        X-Windows
                                     Development
7    Software Libraries              Standards                       Tools and Techniques - CASE -
                                                                     Systems Analysis and Design
8    Testing and Debugging -         Structured Programming          Requirements/Specifications
     Correctness Proofs
9    Reliability                     Applications and Expert         General
                                     Systems - Tools and Techniques
10   Program Editors                 Concurrent Programming - Ada    Concurrent Programming
11   Requirements/Specifications -   Computer-Aided Design           Metrics
     Systems Analysis and Design
12   Modules and Interfaces          Error Handling and Recovery
13   Real-Time Systems               Distribution and Maintenance
14   Abstract Data Types             Software Configuration Management
15   Metrics                         Reusable Software
16                                   Software Management - Design
4.2  Network  Summaries

4.2.1  Methodology 
[...] minimum value for L/N is 1/2. We observe that the ratios of links to nodes generally
[...], but remains less than 2 even for the larger [...]

[...] a network to its maximum possible number of links. [...] individual networks and for all
networks in a time period. These measures [...] much about the degree of interactions of
documents and [...]

4.2.2  Findings 

The  results  of  this  analysis  are  presented  in  Tables  6,  7,  and  8. 

For 1982-1986 (Table 6), 698 unique documents were covered by the 15 networks, from a total
of 1646 documents. That is, these 698 documents had co-occurring descriptors that appeared
in at least 4 other documents in this time period. Now consider Map 1 in Table 6: it covers
136 unique documents (19% of the 698 total used). Notice that the networks with higher L/N
ratios also tend to have higher document and node values. Also note that the column
"Percentage of Documents" totals more than 100% because a document can be included in the
construction of more than one map.
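
The two structural measures in the summary tables can be recomputed from the node and link counts. The sketch below is illustrative: the connectivity formula (links divided by the maximum possible number of undirected links, N(N-1)/2) is an assumption inferred from the tabulated values, and the function names are hypothetical.

```python
def links_per_node(nodes, links):
    """Ratio L/N of links to nodes in one network."""
    return links / nodes


def connectivity(nodes, links):
    """Fraction of possible links present, assuming undirected links.

    The maximum number of links among N nodes is N * (N - 1) / 2.
    """
    max_links = nodes * (nodes - 1) / 2
    return links / max_links


# (map, nodes N, links L) for three 1982-1986 networks in Table 6.
ROWS = [(1, 18, 24), (2, 2, 1), (9, 6, 6)]

for map_id, n, l in ROWS:
    print(f"Map {map_id}: L/N = {links_per_node(n, l):.2f}, "
          f"connectivity = {connectivity(n, l):.0%}")
```

For these three rows the sketch gives L/N values of 1.33, 0.50, and 1.00 and connectivity values of 16%, 100%, and 40%, matching the corresponding Table 6 entries.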
Table 6: 1982-1986 Network Summary Data

Total unique documents included: 698
Total documents available: 1646
Percentage of documents used: 42%

Map     Nodes N   Links L   L/N     Percentage of   Unique      Percentage of
                                    Connectivity    Documents   Documents (a)
1       18        24        1.33    16%             136         19%
2       2         1         0.50    100%            6           1%
3       17        24        1.41    18%             197         28%
4       2         1         0.50    100%            5           1%
5       20        23        1.15    12%             108         15%
6       20        22        1.10    12%             173         25%
7       3         2         0.67    67%             11          2%
8       20        23        1.15    12%             129         18%
9       6         6         1.00    40%             16          2%
10      3         2         0.67    67%             13          2%
11      17        24        1.41    18%             117         17%
12      2         1         0.50    100%            5           1%
13      3         2         0.67    67%             12          2%
14      3         2         0.67    67%             9           1%
15      12        13        1.08    20%             55          8%
Totals                                              992         142% (a)

a. Can exceed 100% because a document can be included in more than one network.
Table 7: 1987-1990 Network Summary Data

Total unique documents included: 3062
Total documents available: 7650
Percentage of documents used: 40%

Map     Nodes N   Links L   L/N     Percentage of   Unique      Percentage of
                                    Connectivity    Documents   Documents (a)
1       6         6         1.00    40%             50          2%
2       17        22        1.29    16%             251         8%
3       3         2         0.67    67%             27          1%
4       16        24        1.50    20%             485         16%
5       15        24        1.60    23%             847         28%
6       20        23        1.15    12%             732         24%
7       3         2         0.67    67%             29          1%
8       4         3         0.75    50%             67          2%
9       19        24        1.26    14%             666         22%
10      18        24        1.33    16%             396         13%
11      4         3         0.75    50%             52          2%
12      2         1         0.50    100%            17          1%
13      4         3         0.75    50%             52          2%
14      6         5         0.83    33%             75          2%
15      16        24        1.50    20%             422         14%
16      11        20        1.82    36%             324         11%
Totals                                              4492        147% (a)

a. Can exceed 100% because a document can be included in more than one network.


Table 8: 1991-1994 Network Summary Data

Total unique documents included: 2881
Total documents available: 7395
Percentage of documents used: 38%

Map     Nodes N   Links L   L/N     Percentage of   Unique      Percentage of
                                    Connectivity    Documents   Documents (a)
1       20        21        1.05    11%             565         20%
2       2         1         0.50    100%            27          1%
3       20        23        1.15    12%             861         31%
4       17        24        1.41    18%             492         18%
5       17        17        1.00    13%             401         14%
6       8         7         0.88    25%             125         4%
7       17        24        1.41    18%             643         23%
8       16        23        1.44    19%             487         17%
9       5         5         1.00    50%             95          3%
10      4         3         0.75    50%             37          1%
11      5         5         1.00    50%             83          3%
Totals                                              3816        136% (a)

a. Can exceed 100% because a document can be included in more than one network.


These  data  show  the  variation  in  network  structures  within  a  time  period.  Some  networks  are 
minimal;  they  have  only  two  nodes.  Examples  are  Network-2,  -12,  and  -2  (Logic  Programming, 
Error  Handling  and  Recovery,  and  Petri  Nets)  from  1982-1986,  1987-1990,  and  1991-1994, 
respectively.  Some  other  networks  approach  minimal  structure. 

Other networks are more fully formed. Some embody the maximum allowable number of links,
nodes, or both. See 1991-1994 Network-1, -3, -4, -7, and -8 (User Interfaces, Software
Development - Object-Oriented Programming, Software Libraries - C++ - Microsoft Windows, Tools
and Techniques - CASE - Systems Analysis and Design, and Requirements/Specifications) for
examples.
5  Types  of  Networks  and  Their  Interactions

5.1  Methodology 

There are essentially three types of networks: principal, secondary, and isolated. Principal
networks are connected to one or more (secondary) networks. Secondary networks generally are
linked to principal networks through a relatively high number of external links in the principal
networks. Isolated networks have an absence (or low intensity) of links with other networks.

Isolated networks often have links with high S values, usually accompanied by low
co-occurrence Cij values. While isolated networks are easy to recognize, principal and secondary
networks may not be. Therefore, we will define and operationalize terms that characterize these
functionalities.

We define density as the mean of the Pass-1 S values of a network; centrality is defined as
the square root of the sum of the squares of the Pass-2 S values of a network in order to
distinguish among relatively close values. Density represents the internal strength of a network,
while centrality represents a network's position in strength of interaction with other networks.8
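
The two definitions above translate directly into code. The following is a minimal sketch; the function names and the sample S values are hypothetical and are not data from this study.

```python
import math
from statistics import mean


def density(pass1_s_values):
    """Mean of the Pass-1 (internal link) S values of a network."""
    return mean(pass1_s_values)


def centrality(pass2_s_values):
    """Square root of the sum of squared Pass-2 (external link) S values.

    A network with no Pass-2 links has centrality 0.
    """
    return math.sqrt(sum(s * s for s in pass2_s_values))


# Hypothetical S values for one network.
internal = [0.05, 0.03, 0.04]
external = [0.3, 0.4]

print(f"density    = {density(internal):.3f}")
print(f"centrality = {centrality(external):.3f}")
print(f"centrality of an isolated network = {centrality([]):.3f}")
```

Squaring before summing gives more weight to strong external links than a plain sum would, which helps separate networks whose Pass-2 values are close to one another.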


5.2  Findings 

Plots of centrality and density for each of the time periods are shown in Figures 4, 5, and 6.9
The origin of these figures is the median of the respective axis values (the horizontal axis
represents centrality; the vertical axis represents density). Not surprisingly, most networks with
strong centrality scores also show relatively high unique document counts and L/N ratios, as
indicated in Table 8 for 1991-1994 data.

Isolated networks show relatively low document counts and L/N ratios (see 1991-1994
Network-2, Petri Nets).


8  These terms are accepted ones in co-word analysis literature. We recognize that density and
centrality have other domain-specific connotations, say, in statistics. Alternative choices [...]
already have meanings in software engineering literature. Adhesion and density could be used, but
that [...] confounds use of the term density even more. We trust the intent of our usage is clear.

9  Figures 4, 5, and 6 are not to precise scale; relative positions are represented.
[Figure 6 plot: horizontal axis Centrality, vertical axis Density; origin at the medians (0.267, 0.035)]

Network 1 - User Interfaces
Network 2 - Petri Nets
Network 3 - Software Development - Object-Oriented Programming
Network 4 - Software Libraries - C++ - Microsoft Windows
Network 5 - Windows
Network 6 - X-Windows
Network 7 - Tools and Techniques - CASE - Systems Analysis and Design
Network 8 - Requirements/Specifications
Network 9 - General
Network 10 - Concurrent Programming
Network 11 - Metrics

Figure 6: 1991-1994 Centrality and Density

The most interesting networks are the ones with both strong density and strong centrality.
Few of these emerge, which testifies to software engineering's somewhat indefinite focus.
[...] are identified in 1987-1990. However, in the 1991-1994 data, Network-1, -3, and
-4 have these properties. These networks also have strong interaction with each other.
Network-7 shows strong centrality. Network-2 shows strong density but weak (actually zero)
centrality. [...] Network-10 and [...] are below the median for both centrality and density
scores. Similar analyses can be performed on the other periods.
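
The quadrant reading of the centrality and density plots can be sketched as a small classification step. This is an illustration under stated assumptions: the labels, the function name classify, and the sample scores are hypothetical, and a score equal to the median is treated here as not strong.

```python
from statistics import median


def classify(scores):
    """Label each network relative to the median centrality and density.

    scores maps a network name to a (centrality, density) pair.
    """
    c_median = median(c for c, _ in scores.values())
    d_median = median(d for _, d in scores.values())
    labels = {}
    for name, (c, d) in scores.items():
        central, dense = c > c_median, d > d_median
        if central and dense:
            labels[name] = "strong centrality and density"
        elif central:
            labels[name] = "strong centrality only"
        elif dense:
            labels[name] = "strong density only"
        else:
            labels[name] = "at or below both medians"
    return labels


# Hypothetical (centrality, density) scores for five networks.
SAMPLE = {
    "A": (0.50, 0.08),
    "B": (0.00, 0.09),
    "C": (0.40, 0.02),
    "D": (0.10, 0.01),
    "E": (0.25, 0.04),
}

for name, label in classify(SAMPLE).items():
    print(name, "-", label)
```

In this hypothetical sample, network B plays the role of an isolated network such as Petri Nets: high internal density with zero centrality.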


5.3  Evidence  of  a  Coalescing  Field 

It appears that software engineering is finding more general definition in 1991-1994 than
in the other two earlier time periods. We can see this by looking at data for numbers of
networks, for centrality, and for density in Table 9.


Table 9: Comparison of Properties for Time Periods

Property             1982-1986   1987-1990   1991-1994
Number of Networks   15          16          11
Median Centrality    .2176       .2176       .2664
Median Density       .0507       .0458       .0350

This comparison is especially striking for the periods 1987-1990 and 1991-1994. We observe
that the number of networks declined, the centrality measure increased, and the density
measure decreased. This indicates more integration of subtopics and fewer isolated networks, as
would be expected in a more focused discipline. Future data will be needed to evaluate this
possible trend.
6  Super  Network  Analysis


6.1  Methodology 

In  addition  to  describing  how  networks  compare  within  a  period,  we  can  be  more  specific  in 
describing  how  networks  interact  with  other  specific  networks;  this  addresses  centrality  in  a 
more  focused  fashion,  but  does  not  substitute  for  the  general  centrality  measure. 

We chose to operationalize principal and secondary networks as follows: If Network-A has
internal nodes that are Pass-2 nodes in x links of Network-B, and each of these links has a
Pass-2 S value that exceeds the minimum Pass-1 S value of Network-B, then Network-A is
a secondary network of Network-B.

Using this way of determining principal and secondary networks, we can describe super
networks of networks. The relationships in these super networks are not inherently bi-directional
as are network links (at least as defined using S).
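
The rule above can be sketched as a counting step followed by a threshold test. The data layout here is an assumption made for illustration: each Pass-2 link of Network-B is represented as a pair of its external node and its S value, and a dictionary records which network owns each node. All names and values are hypothetical.

```python
def count_qualifying_links(pass2_links_b, pass1_s_b, owner, network_a):
    """Count Pass-2 links of B that reach nodes of A with a strong enough S value.

    A link qualifies when its external node belongs to network_a and its
    S value exceeds the minimum Pass-1 S value of Network-B.
    """
    threshold = min(pass1_s_b)
    return sum(
        1
        for external_node, s_value in pass2_links_b
        if owner.get(external_node) == network_a and s_value > threshold
    )


def is_secondary(pass2_links_b, pass1_s_b, owner, network_a, x=2):
    """True if network_a is a secondary network of B at threshold x."""
    return count_qualifying_links(pass2_links_b, pass1_s_b, owner, network_a) >= x


# Hypothetical data for a principal candidate, Network-B.
PASS1_S_B = [0.09, 0.06, 0.04]
PASS2_LINKS_B = [("reuse", 0.07), ("ada", 0.05), ("petri nets", 0.02), ("windows", 0.06)]
OWNER = {"reuse": "A", "ada": "A", "petri nets": "C", "windows": "D"}

for candidate in ("A", "C", "D"):
    print(candidate, is_secondary(PASS2_LINKS_B, PASS1_S_B, OWNER, candidate))
```

Because the test is applied to the Pass-2 links of Network-B only, the relationship is directional: finding that A is secondary to B says nothing about whether B is secondary to A.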

6.2  Findings 

Tables 10, 11, and 12 give all networks that have at least one qualifying connection with other
networks. Shown with each network is an entry in the form y(z); y indicates the associated
network and z shows the number of qualifying links. From this, we can then construct a super
network at whatever threshold of x we choose.

Setting the threshold at x = 2 qualifying connections, we can construct a super network of
networks for each period as shown in Figures 7, 8, and 9. By selecting higher or lower values for
the threshold (either in terms of the number of qualifying links or the level of qualification), we
can derive other super networks.
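
The thresholding step can be sketched with the 1991-1994 counts of qualifying links tabulated in this section. The dictionary layout and the function names are assumptions made for illustration; only the counts come from the report.

```python
# 1991-1994 qualifying links: network -> {associated network: number of links}.
QUALIFYING_1991_1994 = {
    1: {7: 1, 9: 2},
    2: {},
    3: {4: 2, 7: 5},
    4: {3: 3, 6: 1},
    5: {},
    6: {4: 1},
    7: {1: 1, 3: 13},
    8: {1: 1, 3: 5, 7: 3},
    9: {1: 2},
    10: {},
    11: {},
}


def super_network(table, x):
    """Directed (network, associated network) pairs with at least x qualifying links."""
    return sorted(
        (network, other)
        for network, row in table.items()
        for other, count in row.items()
        if count >= x
    )


def isolated_networks(table, x):
    """Networks that take part in no connection at threshold x."""
    connected = {n for pair in super_network(table, x) for n in pair}
    return sorted(set(table) - connected)


print(super_network(QUALIFYING_1991_1994, x=2))
print(isolated_networks(QUALIFYING_1991_1994, x=2))  # [2, 5, 6, 10, 11]
```

At x = 2 the isolated networks are 2, 5, 6, 10, and 11. Raising or lowering x changes which pairs survive, which is how other super networks can be derived from the same table.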

Consider the 1991-1994 super network (Figure 9) and its underlying generating data (Table
12). The names and other prominent descriptors of 1991-1994 networks are included in Figure
9 for convenience because they are used in the following discussion.

Some  observations  include  the  following: 

•  Network-2,  -5,  -6,  -10,  and  -11  are  isolated  networks. 

•  Network-3 is a secondary network of principal network Network-8.

•  Network-7 is a secondary network of Network-8.

•  Network-3  is  a  principal  network  and  a  secondary  network  relative  to  both 
Network-4  and  Network-7. 

•  Network-7  is  especially  strongly  connected  to  Network-3;  Network-3  is 
less  strongly  connected  to  Network-7  (at  least  relative  to  the  former). 
Putting this in context of the networks' contents, we might conclude the following:

•  Object-oriented programming is a major focus of software development.

•  Software libraries have combined with object-oriented methodologies as
principal development activities.

•  The major systems used now in software engineering are Ada, C++, C,
UNIX, X-Windows, and Microsoft Windows.

•  Computer-aided software engineering and object-oriented languages are
emerging as specific tools in software development.

Looking  further  at  the  isolated  networks  and  the  centrality/density  diagram,  we  might  conclude 
that  Petri  Nets  is  either  an  emerging  or  dying  research  topic  because  it  is  completely  isolated 
from  other  networks.  Many  other  conclusions  and  impressions  are  derivable  from  the  networks 
and  super  networks.  Interested  readers  can  make  additional  analyses  with  the  information 


Table 10: Possible 1982-1986 Super Networks

Network   Connected Networks [network number(number of links)]
1         3(1), 6(2), 11(2), 15(1)
2         none
3         1(1), 15(2)
4         none
5         6(2)
6         1(2), 5(2), 7(1), 8(1), 10(1), 11(2), 15(1)
7         none
8         11(1), 14(1)
9         none
10        none
11        1(2), 6(2), 8(2)
12        none
13        none
14        8(1)
15        1(1), 3(2), 8(1)

CMU/SEI-95-TR-019


Table 11: Possible 1987-1990 Super Networks

Network   Connected Networks [network number(number of links)]
1         none
2         4(3)
3         none
4         2(4), 6(5), 15(2), 16(1)
5         9(4)
6         4(2), 5(2), 9(5), 14(1), 16(1)
7         none
8         none
9         5(7), 6(5), 10(1), 15(1)
10        2(1), 6(1), 9(1), 15(2)
11        none
12        none
13        none
14        none
15        4(1), 6(2), 9(3), 10(3)
16        6(1)


Table 12: Possible 1991-1994 Super Networks

Network   Connected Networks [network number(number of links)]
1         7(1), 9(2)
2         none
3         4(2), 7(5)
4         3(3), 6(1)
5         none
6         4(1)
7         1(1), 3(13)
8         1(1), 3(5), 7(3)
9         1(2)
10        none
11        none


[Figure 9 diagram: super network of 1991-1994 networks]

Network 1 - User Interfaces
    Computer Science Education
Network 2 - Petri Nets
Network 3 - Software Development - Object-Oriented Programming
Network 4 - Software Libraries - C++ - Microsoft Windows
    Object-Oriented Programming, C
Network 5 - Windows
Network 6 - X-Windows
Network 7 - Tools and Techniques - CASE - Systems Analysis and Design
    Ada, Object-Oriented Programming, Programming Environments
Network 8 - Requirements/Specifications
    Testing and Debugging, Program Verification
Network 9 - General
    Computer Science Education
Network 10 - Concurrent Programming
Network 11 - Metrics

Figure 9: 1991-1994 Super Network, x = 2
7  Trends  over  Periods


By examining the super networks and their component networks over the different time
periods, we can observe aspects of the evolution of software engineering. First we consider
specific contexts of some descriptors in different time periods; then we illustrate a way to trace
the transformation of general network themes over time.

This general methodology is applicable in other similar applications. We demonstrate this
technique for CCS findings.


7.1  Analysis  of  Descriptor  Contexts 

Through the use of network names, we observed that the foci of study in each period were
software development (which includes management), user interfaces, parallelism, verification
and validation, requirements/specifications, and tools and techniques. However, while these
foci maintain some of the same connections over different time periods, they also evolve by
forming new connections to different nodes. For example, the 1991-1994 Network-7 (Tools
and Techniques) appears with CASE, object-oriented techniques, reuse, and Ada, whereas
in the related 1982-1986 Network-5, Tools and Techniques appears with Pascal and structured
programming topics.

Much  of  the  change  can  be  gleaned  from  detailed  examinations  of  networks.  To  illustrate  this 
process,  we  will  present  two  detailed  cases. 

First, we will look at some smaller portions of pertinent networks. In 1982-1986, Ada appears
as four nodes in Network-1 (Software Management - Ada)10 but in a rather isolated fashion
(Figure 10). Later it becomes an integral part of 1987-1990 Network-15 (Reusable Software)
(Figure 11) and 1991-1994 Network-7 (Tools and Techniques - CASE - Systems Analysis and
Design) (Figure 12). In the middle time period, it associates with high-level concepts such as
[...], module interfaces, concurrency, and object-oriented software. In the latter period, it
associates with military, which demonstrates Ada's special importance in that arena of software
development, and with computer science education, which [...] importance in the research
community [...]


10  As an implicit subject descriptor, Ada can appear in any appropriate CCS category. It was
the object of study from many approaches at the time.

11  Ada appears in several networks during each time period; these networks and contexts are
selected as examples.
[Figure: network map fragment with nodes Ada D.3.3, Ada D.2.6, and Software quality assurance (SQA) D.2.9]
As high-level software issues become more integrated, older issues fade. Pascal, Basic, and
Cobol appear in the 1982-1986 Network-5 (Tools and Techniques - Structured Programming
- Pascal; see Figure 13). This network is based on programming-in-the-small issues, such as
structured programming and top-down programming. In the 1987-1990 Network-8 (Structured
Programming), Basic and Cobol appear almost in isolation with structured programming (see
Figure 14). That theme then disappears in 1991-1994 as software engineering research
moves to programming-in-the-large concerns.


Figure 12: Ada in Network-7, 1991-1994

Figure 13: Structured Programming in Network-5, 1982-1986

Figure 14: Structured Programming in Network-8, 1987-1990


We can summarize some other observations; the reader may reference the corresponding
maps in the appendix.
The topic of standards, which is important in any well-defined engineering field, appears in
isolation in 1982-1986 and 1987-1990 (Network-4 and -7, respectively, both named Standards),
then goes away in 1991-1994. This indicates that standards have not been integrated into
other important software engineering discussions in any of the time periods and even cease to
be discussed with any regularity in the most recent time period, though not all the most recent
data have been analyzed.


Petri nets and unbounded-action devices appear in isolation in Network-2, 1991-1994. We
cannot yet tell if they will become part of a larger network. Modules and interfaces appear in
isolation in Network-12, 1982-1986, then appear more interrelated with other descriptors (e.g.,
with Ada in 1987-1990 Network-10 (Concurrent Programming - Ada) and with reusable software
in 1987-1990 Network-15 (Software Management - Design)). After that, modules and interfaces
do not appear, but reusable software and related themes dominate; perhaps the topic of
modules and interfaces has been subsumed in these expanded topics. We will return to this
in a later chapter of this report.

The  networks  and  contexts  discussed  here  are  not  exhaustive.  Many  other  transformations  of 
themes  are  suggested  by  the  networks  and  their  maps. 


7.2  Analysis  of  Networks  Across  Time  Periods 

7.2.1  Methodology 

The transformation of networks and their intersections with other networks across time periods
provides insights into the emergence of software engineering research themes. To quantify
this analysis, we apply the similarity index (SI) approach, which is patterned after Callon's
dissimilarity index [Callon 91].

SI measures the intersection of the descriptors in two networks. It does not directly include
the corresponding links in networks; however, since all descriptors in a network are at least
indirectly linked, this metric captures some portion of network similarity.

Consider two networks $N_i$ and $N_j$. Let $w_i$ be the number of descriptors in $N_i$, let $w_j$
be the number of descriptors in $N_j$, and let $w_{ij}$ be the number of descriptors common to
$N_i$ and $N_j$. Then

\[
SI(w_i, w_j, w_{ij}) = 2 \left( \frac{w_{ij}}{w_i + w_j} \right), \qquad 0 \le SI \le 1.
\]

We multiply by 2 so that the maximum value of SI is 1, which occurs when $N_i$ and $N_j$ have
identical nodes.

7.2.2  Findings 

We can apply SI to examine the emergence of some 1991-1994 networks. Especially
interesting are the three networks showing both strong centrality and density values (called core
networks or core themes). The networks are Network-1, User Interfaces; Network-3, Software
Development - Object-Oriented Programming; and Network-4, Software Libraries - C++ -
Microsoft Windows.


First,  consider  1991-1994  Network-1,  User  Interfaces.  It  has  reportable  SI  intersections  with 
four  1987-1990  networks,  as  shown  below.12 


N1: 1991-1994 Network-1, User Interfaces

N2 (1987-1990 networks)                            w1    w2    w12   SI
1. Network-5: User/Machine Systems                 14    14    6     0.423
2. Network-6: Methodologies - Software Development 14    20    7     0.412
3. Network-9: Applications and Expert Systems -    14    19    5     0.303
   Tools and Techniques
4. Network-16: Software Management - Design        14    11    5     0.400

Hence, the 1991-1994 theme User Interfaces incorporates descriptors from several 1987-1990
networks. Its emergence history is complicated; tracing it further could require investigation
of four 1987-1990 networks and of all their 1982-1986 predecessor networks.

Similarly, 1991-1994 Network-3, Software Development - Object-Oriented Programming,
displays a multiply engendered network history. It has reportable SI values with seven
1987-1990 networks.


12  Only CCS descriptors defined in both pertinent time periods are included in SI descriptor
counts. To ensure notable intersection between $N_i$ and $N_j$, we require $w_{ij} \ge 5$ before
reporting SI.
N1: 1991-1994 Network-3, Software Development - Object-Oriented Programming

N2 (1987-1990 networks)                            w1    w2    w12   SI
1. Network-4: Requirements/Specifications -        18    16    5     0.294
   Methodologies
2. Network-6: Methodologies - Software Development 18    20    7     0.369
3. Network-9: Applications and Expert Systems -    18    19    5     0.270
   Tools and Techniques
4. Network-10: Concurrent Programming - Ada        18    18    7     0.389
5. Network-14: Software Configuration Management   18    6     6     0.500
6. Network-15: Reusable Software                   18    16    7     0.417
7. Network-16: Software Management - Design        18    11    8     0.552

[...] 1987-1990 Network-14, Software Configuration Management, was completely
absorbed by the 1991-1994 network (i.e., all descriptors of the earlier network are descriptors
of the latter network).

Now, consider 1991-1994 Network-4, Software Libraries - C++ - Microsoft Windows. It has a
reportable SI value, 0.375, for only one 1987-1990 network, Network-15, Reusable Software.
Tracing this latter network to 1982-1986 ones shows that it has a reportable SI value, 0.345,
with 1982-1986 Network-6, Software Development. This 1987-1990 network also absorbs
1982-1986 Network-7, Software Libraries, and Network-12, Modules and Interfaces; but each
of these networks has fewer than five descriptors, so criteria for SI scores are not met. This
history suggests a relatively well-defined emergence path for themes dealing with software
reuse.

SI analysis can also show the lack of a traceable past. Consider 1991-1994 Network-6,
X-Windows. It has no identifiable 1987-1990 predecessors. Only four networks from that earlier
period share even one descriptor with it (in all cases the same descriptor, User interfaces
D.2.2). Similarly, 1991-1994 Network-5, Windows, has no reportable 1987-1990 predecessors.
Only two 1987-1990 networks share any descriptors with it (1987-1990 Network-10 and
-15, with one and two descriptors, respectively). Taken together, we see a rapid emergence
of windows-based research. Sometimes research foci emerge quickly, as expected in a
dynamic field.
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7.2.3  Similarity  Index  Within  a  Time  Period 

SI  can  also  be  useful  within  a  time  period  to  assess  the  similarity  of  companion  networks. 
Consider  the  1991-1994  core  networks:  They  have  substantial  intersection  with  each  other, 
as  seen  below: 


N_1                             N_2                             w_1   w_2   w_12   SI

1991-1994 Network 1:            1991-1994 Network 3:            20    20    7      0.350
User Interfaces                 Software Development -
                                Object-Oriented Programming

1991-1994 Network 1:            1991-1994 Network 4:            20    17    5      0.270
User Interfaces                 Software Libraries - C++ -
                                Microsoft Windows

1991-1994 Network 3:            1991-1994 Network 4:            20    17    6      0.324
Software Development -          Software Libraries - C++ -
Object-Oriented Programming     Microsoft Windows

The network predecessors of 1991-1994 core themes demonstrate notable characteristics. All
of them with reportable SI scores also have high centrality scores (Figures 4, 5, and 6) for the
time periods of interest, except for 1987-1990 Network-14, which had a slightly below-median
score. However, that network was completely absorbed by its successor. Similarly, two
1982-1986 networks with below-median centrality scores were completely absorbed by their
successor, even though their SI scores were not reportable. In these latter cases, the networks
were all small and relatively isolated.

This observation suggests that core themes may normally emerge from predecessor networks
that already display relatively strong connections to other networks within the same time
period. It also suggests that isolated networks may quickly become part of more integrated
networks in a succeeding time period. This absorption could occur because one new link
connects a small, isolated network to a larger network. However, certainly not all isolated
networks merge with larger ones, as is evident so far in the case of the Standards networks of
1982-1986 and 1987-1990. As noted above in the case of the Structured Programming theme,
a network also can transform from a core theme (1982-1986, Network-5) to an isolated theme
(1987-1990, Network-8).

8 Descriptor Analysis


Direct analysis of co-word generated descriptor nodes gives a supporting view of which
descriptors in CCS, but outside of software engineering, interact with software engineering
descriptors.

8.1  Analysis 

Recall that only descriptors that co-occur with other descriptors a requisite number of times
and with relatively high strength are candidates for inclusion in networks. Many descriptors
that appear in documents do not associate often enough or strongly enough with other
descriptors to be considered for inclusion. The strengths of associations relative to other
associations further limit which links enter into a network. Of the 1,606 unique descriptors
appearing in all documents, 158 (9.8%) descriptors satisfied these criteria and appeared in the
generated networks. We cannot define the maximum possible number of nodes because of
unrestricted numbers of implicit subject descriptors.
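The filtering step recalled above can be sketched as follows. This is only an illustration under stated assumptions: the strength measure used here is the equivalence index c_ab^2 / (c_a * c_b) that is common in co-word analysis, and the thresholds `min_cooc` and `min_strength` are hypothetical values, not the ones used in this study. The multi-pass network construction itself is not shown.

```python
from collections import Counter
from itertools import combinations


def candidate_links(documents, min_cooc=2, min_strength=0.2):
    """Return {(a, b): strength} for descriptor pairs that pass both thresholds.

    documents: iterable of descriptor collections, one per document.
    Strength is the assumed equivalence index c_ab**2 / (c_a * c_b).
    """
    occurrences = Counter()      # number of documents containing each descriptor
    cooccurrences = Counter()    # number of documents containing each pair
    for descriptors in documents:
        unique = sorted(set(descriptors))
        occurrences.update(unique)
        cooccurrences.update(combinations(unique, 2))

    links = {}
    for (a, b), c_ab in cooccurrences.items():
        strength = c_ab ** 2 / (occurrences[a] * occurrences[b])
        if c_ab >= min_cooc and strength >= min_strength:
            links[(a, b)] = strength
    return links


# Toy data; the descriptor assignments are invented for illustration.
docs = [
    ["Metrics D.2.8", "Management D.2.9"],
    ["Metrics D.2.8", "Management D.2.9", "Ada D.3.2"],
    ["Ada D.3.2", "Tools and techniques D.2.2"],
    ["Tools and techniques D.2.2"],
]
links = candidate_links(docs)
surviving = {d for pair in links for d in pair}
print(links)
print(f"{len(surviving)} of 4 descriptors survive the filter")
```

In this toy corpus only the Metrics and Management pair co-occurs often enough to pass, which mirrors how a small share of descriptors can reach the networks.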

Table 13 summarizes the most frequently appearing descriptors in each time period. The table
was generated by first obtaining the 15 most frequently appearing descriptors within each time
period and then eliminating redundancy from the combined lists. Table 13 lists descriptors
alphabetically. For each descriptor, its rank in each period is shown by the number of documents
in which it appears, the number of networks in which it appears, and the number of times it
appears in networks (a descriptor can be connected to more than one other descriptor in the
same network, as evident in Figure 3).

Table 13: Summary of Descriptor Data

Rank Order of Descriptor Statistics by Generation

Columns: Descriptor; Rank in # Documents (82-86, 87-90, 91-94); Rank in # Networks
(82-86, 87-90, 91-94); Rank in # Times in Network (82-86, 87-90, 91-94)

Ada  D.3.2 

8 

13 

15 

ID 

D 

6t 

7 

Applications  and  expert  sys  I.2.1

- 

15 

- 

Hi 

5 

- 

Computer  aided...  (CASE)  D.2.2 

# 

ID 

# 

# 

8t 

Correctness  proofs  D.2.4 

W 

- 

j|B| 

- 

lot 

- 

- 

Design  D.2.1 

# 

BH 

# 

||P| 

# 

DD 

General  D.2.0

3 

7 

7 

ID 

Human  factors  H.1.2

6 

8 

- 

it 

mi 

Interactive  D.2.6 

- 

12 

- 

- 

ID 

mi 

mil 

ID 

Management  D.2.9 

9t 

- 

13 

ID 

Methodologies  D.2.10 

# 

3 

1 

1 

# 

1 

■S 

Metrics  D.2.8 

15 

- 

mu 

- 

8t 

- 

Object-oriented  programming  D.1.5 

# 

# 

2 

# 

# 

2 

# 

# 

It 

Program  verification  D.2.4 

13 

| 

- 

- 

- 

Programming  environments  D.2.6 

2 

4 

6 

4 

D 

Requirements/specifications  D.2.1 

12 

6 

8 

'WM 

Reusable  software  D.2.m 

- 

9 

9 

w 

D 

- 

| 

Software  development  K.6.3 

4 

5 

5 

u 

2 

mi 

1 

— 

Software  management  K.6.3 

| 

- 

D 

- 

- 

2t 

- 

DM 

Structured  programming  D.2.2 

|| 

- 

- 

MM 

- 

- 

Testing  and  debugging  D.2.5 

nni 

m 

D 

4t 

15 

lit 

H 

Dl 

Dl 

Tools  and  techniques  D.2.2 

5 

2 

3 

1 

2t 

User  interfaces  D.2.2 

i 

1 

1M 

m 

5 

8t 

5t 

9t 

5 

User  interfaces  H.5.2 

# 

mm 

# 

8t 

# 

# 

User/machine  systems  H.1.2 

■HI 

n 

2 

■I 

- 

- 

9t 

Windows  D.2.2 

- 

- 

1 

- 

|| 

- 

ppi 

# = Node not in CCS for period.
t = Tie for ranked position. Ties for position n are all ranked as n; the next ranked position
begins at n+m, where m is the number of ties at rank n.
- = Not in highest 15 for period.
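The construction of Table 13 (take the most frequent descriptors in each period, merge the lists without duplicates, list them alphabetically, and let ties share a position) can be sketched as below. The counts are invented for illustration, the helper names are hypothetical, and the treatment of ties at the cutoff is an assumption. The example uses a cutoff of 2 instead of 15 so that the toy data stays small.

```python
def competition_ranks(counts):
    """Map each descriptor to its rank; ties share rank n and the next rank is n + m."""
    ordered = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    ranks = {}
    for position, (name, count) in enumerate(ordered, start=1):
        if position > 1 and count == ordered[position - 2][1]:
            ranks[name] = ranks[ordered[position - 2][0]]  # tie: reuse previous rank
        else:
            ranks[name] = position
    return ranks


def summary_rows(counts_by_period, top_n=15):
    """Alphabetical rows of (descriptor, rank per period); None means outside the top_n."""
    ranks_by_period = {p: competition_ranks(c) for p, c in counts_by_period.items()}
    selected = set()
    for ranks in ranks_by_period.values():
        selected.update(name for name, rank in ranks.items() if rank <= top_n)
    periods = list(counts_by_period)
    rows = []
    for name in sorted(selected):
        row = []
        for p in periods:
            rank = ranks_by_period[p].get(name)
            row.append(rank if rank is not None and rank <= top_n else None)
        rows.append((name, row))
    return rows


counts = {
    "82-86": {"Ada D.3.2": 40, "Metrics D.2.8": 40, "General D.2.0": 10},
    "87-90": {"Ada D.3.2": 5, "General D.2.0": 30, "Windows D.2.2": 20},
}
for name, row in summary_rows(counts, top_n=2):
    print(name, row)
```

The sketch does not distinguish "not in CCS for period" from "not in highest 15"; both appear as None, whereas the table marks them separately.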

Some common themes also emerge from the descriptor data. The following themes appear
consistently and repeatedly: tools and techniques, user interfaces, programming environments,
reusable software, design methodologies, software management and development,
testing and debugging, verification, metrics, Ada, and requirements/specifications. Some new
descriptors are prominent in 1991-1994 data, including computer-aided software engineering,
object-oriented programming, and Windows.

Only  the  following  25  descriptors  appeared  in  all  time  periods  (not  just  among  the  15  most 
common  by  period).13 

•  Ada  D.3.2 

•  Concurrent  programming  D.1.3 

•  Curriculum  K.3.2 

•  Design  D.2.10 

•  General  D.2.0 

•  Human  factors  H.1.2 

•  Interaction  techniques  I.3.6

•  Introductory  and  survey  A.1 

•  Management  D.2.9 

•  Mathematical  software  G.4 

•  Methodologies  D.2.1 

•  Metrics  D.2.8 

•  Program  verification  D.2.4 

•  Programming  environments  D.2.6 

•  Requirements/specifications  D.2.1 

•  Software  development  K.6.3 

•  Software  libraries  D.2.2 

•  Software  management  K.6.3 

•  Software  quality  assurance  (sqa)  D.2.9 

•  Specification  techniques  F.3.1 

•  Specifying  and  verifying  and  reasoning  about  programs  F.3.1 

•  Testing  and  debugging  D.2.5 

•  Tools  and  techniques  D.2.2 

•  User  interfaces  D.2.2 

•  User/machine  systems  H.1.2 
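Finding the descriptors that appear in every time period amounts to a set intersection over the per-period descriptor sets. A minimal sketch, using small invented sets rather than the study's data:

```python
def in_all_periods(descriptors_by_period):
    """Return the sorted descriptors present in every period's set."""
    sets = [set(d) for d in descriptors_by_period.values()]
    if not sets:
        return []
    return sorted(set.intersection(*sets))


# Illustrative subsets only. A descriptor that enters the classification in a
# later period (here object-oriented programming D.1.5) cannot be in the result.
periods = {
    "1982-1986": {"Ada D.3.2", "Metrics D.2.8", "Structured programming D.2.2"},
    "1987-1990": {"Ada D.3.2", "Metrics D.2.8", "Reusable software D.2.m"},
    "1991-1994": {"Ada D.3.2", "Metrics D.2.8", "Object-oriented programming D.1.5"},
}
print(in_all_periods(periods))
```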

13. Recall that new CCS descriptors created in 1987 and 1991 are not candidates for
appearance in preceding time periods. Hence, some descriptors that are now commonly used,
such as object-oriented programming D.1.5, could not appear in this list.

Another way to see the filtering effect of the algorithm is to count the descriptors in each major
CCS category in the original data and compare that number to the ones that emerged as
network nodes. Table 14 gives the percentages of the 57,727 descriptors in the original data by
CCS category (first column), the percentages of the 1,606 unique descriptors in the original
data by CCS category (second column), and the percentages by CCS category of the 158
descriptors that passed the co-word analysis filter to reach the resulting 42 networks (third
column).
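The category shares reported in Table 14 can be computed by grouping descriptor codes on their leading CCS letter. The sketch below is illustrative only: the sample codes are invented and the helper names are hypothetical. It reports the share of each top-level category and, separately, the share that falls under D.2.

```python
from collections import Counter


def category_shares(codes):
    """Percentage of codes per top-level CCS category (first letter), rounded to 0.1."""
    if not codes:
        return {}
    counts = Counter(code[0].upper() for code in codes)
    return {cat: round(100 * n / len(codes), 1) for cat, n in sorted(counts.items())}


def share_with_prefix(codes, prefix):
    """Percentage of codes equal to `prefix` or nested under it (for example 'D.2')."""
    hits = sum(1 for c in codes if c == prefix or c.startswith(prefix + "."))
    return round(100 * hits / len(codes), 1)


sample = ["D.2.2", "D.2.8", "D.3.2", "H.1.2", "K.6.3", "D.2.10", "F.3.1", "D.2.m"]
print(category_shares(sample))
print(share_with_prefix(sample, "D.2"))
```

Passing the full list of descriptor occurrences corresponds to the "all descriptors" view, while passing a de-duplicated list corresponds to the "unique descriptors" view.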


Table 14: CCS Descriptor Summary Data

CCS Category                       All Descriptors        Unique Descriptors     Network Descriptors
                                   (57,725)               (1,606)                (158)
A-General Literature               0.6%                   0.4%                   1.3%
B-Hardware                         1.1%                   7.5%                   0%
C-Computer Systems Organization    4.4%                   8.5%                   3.2%
D-Software                         59.1%                  31.9%                  58.3%
                                   (40.1% in D.2)         (11.2% in D.2)         (29.1% in D.2)
E-Data                             0.6%                   1.8%                   0%
F-Theory of Computation            4.5%                   5.2%                   6.3%
G-Mathematics of Computing         1.7%                   5.3%                   1.9%
H-Information Systems              8.9%                   11.6%                  8.9%
I-Computing Methodologies          7.6%                   16.1%                  12.0%
J-Computer Applications            2.5%                   3.6%                   1.3%
K-Computing Milieux                8.9%                   8.1%                   11.3%

The  hardware  and  data  CCS  categories  were  not  represented  at  all  in  the  networks;  and  the 
general  literature,  computer  systems  organization,  mathematics  of  computing,  and  computer 
applications  categories  were  only  marginally  included.  The  theory  of  computation  category 
was  included  primarily  with  respect  to  program  verification. 

Listed below are the 8 non-D.2 descriptors included among the 15 most frequent descriptors
in networks (Table 13). These descriptors highlight interactions among D.2 descriptors and
other descriptors in CCS:

D.1.5 - Object-Oriented Programming
D.3.2 - Ada
H.1.2 - Human factors
H.1.2 - User/machine systems
H.5.2 - User interfaces
I.2.1 - Applications and expert systems
K.6.3 - Software development
K.6.3 - Software management

8.2 Findings

Based on our analyses, it appears that much of software engineering's intersection with the
rest of computing is in the areas of user interaction, software management, and programming
methodology. Very little interaction with hardware, data, mathematics of computing, and
computer applications is evident. Further analyses will reinforce this hypothesis.

Just as important, our analysis of the CCS descriptors shows that some D.2 descriptors play
a less important role than implied in several software engineering definitions [Naur 69],
[Boehm 76], [Zelkowitz 78], [Fairley 85], [Humphrey 89], [Shaw 90], [Denning 92], [IEEE 89].
These definitions normally incorporate terms such as large-scale, economical, managerial,
interdisciplinary, production, maintenance, reliable, dependable, efficient, safety, design, and
specifications. We see some of these themes in our findings, but not all of them.

Human factors is a consistent and important theme in all periods we analyzed. This is contrary
to other attempts to define software engineering, where human factors is often deemed
marginal. Conversely, economic aspects are mentioned consistently in these other discussions.
However, we found little on that subject in the research and development literature, even
though descriptors under (D.2.9) Management - Cost Estimation, and Management - Time
Estimation, as well as (K.6.0) General, Economics were available. Over the three time periods
analyzed here, these descriptors appeared in the unfiltered data 117 times, 15 times, and 33
times, respectively, but did not associate strongly enough with other descriptors to be placed
in any networks.

Also, we find little evidence of a maturing profession as judged by commentary on issues such
as ethics, licensing, certification, human safety, and codes of good practice, even though
appropriate CCS nodes are defined. None of these nodes reached the networks, and only
minimal inclusion was found in the almost 58,000 total, unfiltered descriptors. While the
standards descriptors were included in the first two generations of networks (but in isolated
fashions), they did not appear in 1991-1994 networks. As stated by Shaw [Shaw 90], an
engineering discipline of software is still in the early stages of development.

9 Conclusions


9.1  Methodology 

This study demonstrates that co-word analysis is a viable approach for extracting patterns
from, and identifying trends in, large corpora where the texts collected are from the same
subdomain and divided into roughly equivalent quantities for different time periods. This
methodology has also been used in other studies at the Software Engineering Institute as a
way of filtering risk information collected at external sites [Monarch 95] and for differentiating
process assessments of external sites (those that showed an improvement from those that did
not) with respect to thematic concerns. Moreover, the Software Engineering Risk Repository
(SERR), an information retrieval system containing risk and risk mitigation information from
over 35 software risk assessments, uses term co-occurrence networks for suggesting terms
related to those found in a user's query [Monarch 96]. The system is currently being user tested.


9.2  Findings 

What can we conclude about the state of software engineering based on our study of
publications? First, the field is rapidly evolving, as is demonstrated by the changing descriptors
in networks, the changing connections in super networks, and the changing centrality/density
scores. The analysis of the 1991-1994 data shows a trend towards focusing on object-oriented
themes, software reuse/software library themes, and user interface themes. Consistent
themes are evident over the time periods studied, although contexts change. Some consistent
themes are user interfaces, tools and techniques, verification and validation, software reuse,
requirements and specifications, and design methodologies.

9.2.1  The  Role  of  Software  Tools 

The core themes of user interfaces and software development (with object-oriented methods)
both display underlying principles (such as screen design, design methodologies, reusable
software, and so forth) together with software tools that embody some of these underlying
principles. These tools include X-Windows, Microsoft Windows, Ada, C++, and UNIX. CASE
tools are prominent in software development networks, but names of specific CASE tools are
not present. This observation suggests that the maturity of a software engineering subfield can
be gauged by the maturity of relevant supporting tools. Earlier we observed that the languages
Pascal, Basic, and Cobol dropped from the software engineering descriptors, along with
programming-in-the-small issues such as structured programming. They were replaced by
programming-in-the-large issues and by a different set of supporting tools appropriate for
large-scale software development environments. As software engineering matures, we can
expect to observe the names of other specific software tools and systems, and we may see new
core areas emerge as supporting tools are refined.

Because CCS is a fixed taxonomy with periodic updates to descriptors, the role of implicit
subject descriptors may be crucial in observing trends between and across updates to the
classification system. Therefore, names of languages and systems as reflected in CCS
descriptors provide numerous insights into observing a field's maturation.

9.2.2  Software  Engineering  and  Computer  Science 

What is the relationship between software engineering and computer science? We know of no
comparable study of computer science terminology, so a comparison is difficult, but some
observations are apparent. The latest detailed curriculum model for computing [Denning 89]
listed the nine subareas of computing as algorithms and data structures, programming
languages, architecture, numerical and symbolic computation, operating systems, software
engineering and methodology, database and information retrieval, artificial intelligence and
robotics, and human-computer communication. These areas are not meant to be independent,
of course.

As shown by its descriptor networks, software engineering incorporates topics from most of
these areas, but it stands alone in its emphasis on management, process, design, testing,
specifications, and other fundamental engineering terms. It fits the fundamental engineering
paradigm better than it fits the mathematics or experimental science paradigms [Denning 89].

Software  engineering  certainly  draws  from  computer  science  theories,  but  it  also  depends 
heavily  on  theories  from  management,  psychology,  mathematics,  and  other  related  fields.  We 
feel  it  is  emerging  as  a  discipline  in  computing  rooted  in  computer  science,  but  with  its  own 
character  and  content. 

9.2.3  Limitations  of  This  Study 

This study is based exclusively on refined publications, so it represents topics that are more
developed than some others. Surely, there is much activity in cost/time estimation,
management of programming teams, and other important but relatively immature areas. The
lag time from the invention of software technology until its acceptance into common practice is
estimated at 15-20 years [Redwine 84], so this gap is not surprising. Also, while CCS provides
the proper focus for this study, it may have limitations with respect to more detailed studies of
software engineering trends because of its fixed taxonomy. Applying co-word analysis to
author-defined descriptors, to abstracts, or to a document's text may reveal observations
complementary to the ones we noted.

Appendix: Maps of All Networks

Following are maps of all 42 networks generated by the co-word analysis used in this study.
These images were captured directly from the output of a graphical user interface and are
presented in that form. Corresponding maps in the body of the paper were reconstructed to
enhance readability. To facilitate automatic processing of networks, CCS descriptors and node
codes were appended in the original maps. In the following maps, nodes such as "metricsd2.8"
should be interpreted as "Metrics D.2.8."
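Reading the fused node labels back into a descriptor name and a CCS code can be done with a small pattern match. The following is a sketch under assumptions, not the tooling used in the study: it assumes labels are lowercase, that the code starts with a category letter from A to K, that the first dot may be missing (as in "metricsd2.8"), and that each level is either a number or "m". The capitalization of the name is simplistic.

```python
import re

# Name, then a CCS code: a category letter A-K, an optional dot (some labels
# omit it, as in "metricsd2.8"), then dot-separated levels that are digits or "m".
LABEL = re.compile(
    r"^(?P<name>.+?)\s*(?P<letter>[a-k])\.?(?P<rest>(?:\d+|m)(?:\.(?:\d+|m))*)$"
)


def split_label(label):
    """Split a fused map label into (descriptor name, CCS code), or None if no code is found."""
    match = LABEL.match(label.strip().lower())
    if match is None:
        return None
    name = match.group("name").strip().capitalize()
    code = f"{match.group('letter').upper()}.{match.group('rest')}"
    return name, code


for raw in ["metricsd2.8", "adad.3.2", "software developmentk.6.3", "software psychologyd.m"]:
    print(split_label(raw))
```

The non-greedy name group makes the pattern pick the earliest split at which the remainder is a well-formed code, which is why "adad.3.2" resolves to the name "Ada" rather than "Ad".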

Pass-1 descriptors are enclosed by thick boxes, while Pass-2 descriptors are enclosed by thin
boxes. Pass-1 links are shown by thick lines; Pass-2 links are shown by thin lines. Hashed
lines indicate two Pass-1 nodes linked during Pass-2; recall that such links are treated as
Pass-1 links because they join two Pass-1 nodes.

A.1 1982-1986 Maps of 15 Networks

Figure A.1-1: Software Management - Ada
(Nodes: software development K.6.3; testing and debugging D.2.5; systems analysis and
design K.6.1; corrections D.2.7; management D.2.9; Ada D.3.3; Ada D.2.6; Ada D.3.2;
software management K.6.3; software maintenance K.6.3; distribution and maintenance D.2.7;
software psychology D.m; concurrent programming structures D.3.3; general D.2.0; life cycle
D.2.9; metrics D.2.8)

Figure A.1-2: Logic Programming
(Nodes: logic programming .4.1; logic programming I.2.3)

Figure A.1-4: Standards

Figure A.1-7: Software Libraries

Figure A.1-9: Reliability

Figure A.1-11: Requirements/Specifications - Systems Analysis and Design

Figure A.1-14: Abstract Data Types

A.2 1987-1990 Maps of 16 Networks

Figure A.2-3: Logic Programming

Figure A.2-7: Standards

Figure A.2-15: Reusable Software

Figure A.2-16: Software Management - Design

A.3 1991-1994 Maps of 11 Networks

Figure A.3-1: User Interfaces

Figure A.3-2: Petri Nets

Figure A.3-3: Software Development - Object-Oriented Programming

Figure A.3-7: Tools and Techniques - CASE - Systems Analysis and Design